Abstract

This paper investigates the problem of selecting variables in regression-type models in
an "instrumental" setting. Our study is motivated by the empirical verification of the conditional
convergence hypothesis used in the economic literature on growth rates. To
avoid unnecessary discussion about the choice and the pertinence of instrumental variables,
we embed the model in a very high dimensional setting. We propose a selection procedure
with no optimization step called LOLA, for Learning Out of Leaders with Adaptation. LOLA
is an auto-driven algorithm with two thresholding steps. The consistency of the procedure
is proved under sparsity conditions, and simulations are conducted to illustrate the good
practical performance of LOLA. The behavior of the algorithm is studied when instrumental
variables with no a priori significant connection to the model are artificially added. Using
our algorithm, we provide a solution for modeling the link between the growth rate and the
initial level of the gross domestic product, and we empirically verify the convergence hypothesis.

Index Terms: Variable Selection, Instrumental Variables, Linear Model, High Dimension,
Sparsity. AMS: 62G08

1 Introduction



Let us consider the usual linear model $Y = X\alpha + u$, stated to explain the causal effect
of explanatory variables $X$ on another variable $Y$, in the specific and problematic case where the
p-value associated with the Fisher test is large, implying that the regression is not significant. This
can happen if the covariates $X$ are endogenous: for instance when important predictors are
omitted from the model or when the covariates $X$ are correlated with the errors $u$. The insertion
of instrumental variables $Z$ in the model may lead to consistent estimation of the coefficients $\alpha$.
In economic theory, variables reporting behavior breaks (policy and political changes, natural
disasters) appear to be good candidates for instrumental variables. Choosing the instruments
is a serious issue. We propose to address it in an objective way by considering
a huge set of potential variables $Z_1, \dots, Z_p$. A new problem then appears: selecting
the relevant variables among this huge set of candidates. When the number $p$ of candidates
is very large with respect to the number $n$ of available observations, the classical selection
methods fail. The goal of our paper is to present a very simple procedure called LOLA (with
no recursive step and no optimization step) dedicated to high dimensional regression models,
and to examine its selection properties when confronted with this specific problem of selecting
instrumental variables.

Let us first present the specific question concerning international economic growth.
An especially important point in the empirical growth literature is to evaluate the effect of
the initial level $X$ of gross domestic product (GDP) per capita on the growth rate $Y$ of GDP.
The conditional convergence hypothesis states that, other things being equal, countries with
lower GDP per capita are expected to grow more than others due to higher marginal returns
on capital stock. This convergence hypothesis should imply a negative effect. However, when
empirical tests are performed on the simple linear model $Y = \alpha_1 + \alpha_2 X + u$, the alternative
"$H_1 : \alpha_2 < 0$" is rejected (see Table 2, Section 5.2). A natural idea is that other phenomena
interfere in this relationship and hide the negative effect. Unfortunately, growth theory is not
explicit enough about the set of key factors of growth. These factors could belong to various
categories: policy variables (fiscal, exchange rate and trade policies), political variables (rule
of law, political rights, ...), religious variables, regional variables, type of investment
(equipment/non equipment), variables relating to the macroeconomic environment (inflation, initial
level of GDP, ...), variables accounting for the international environment (terms of trade, ...), etc.
Let us explain for instance how Sala (1997) proceeds to select the instrumental variables among
all the potential factors. Based on previous studies, three predictors (the level of income, the life
expectancy and the primary-school enrollment rate) are retained a priori. Next, all the
possibilities of introducing three other predictors (the number of variables is then $p = 1 + 3 + 3 = 7$)
are inspected and the regression model is estimated. A test is then built to choose among all
the estimated regression models. The methodology explains the title of the paper: "I just ran
two million regressions". We present here a different approach: we consider a huge quantity of
predictors to be selected and we compute only ONE regression model, but in high dimension.

The growth rate problem is also studied in Belloni and Chernozhukov (2010), where results
are obtained using lasso methodologies involving optimization steps. Since their results are
convincing, we think the methodology is promising and we compare our results to theirs. We also
add a nonparametric point of view, which sheds new light on the construction of instrumental
variables. Instead of adding economic variables, we just consider the data as a signal and analyze
it as depending on two factors: the initial level $X$ of gross domestic product plus an unknown
signal function estimated in a nonparametric way. As in Belloni and Chernozhukov (2010), we
use the Barro and Lee database, which contains a huge number of instruments $Z = (Z_1, \dots, Z_p)$
including many covariates characterizing the different countries (see Section 5; for a
detailed description of the data, see Barro and Sala (1995)). Since the number of instruments
$p \approx 300$ is large compared to the number of countries $n \approx 100$, we are in a high dimensional
setting and we proceed as follows:

• First, we apply the selection procedure LOLA to reduce the dimensionality of the problem
and to extract the relevant instruments. This procedure is auto-driven, which has the advantage
of avoiding the search for tuning parameters. Let us emphasize that the number $S$ of selected
instruments is an output of our procedure and is not imposed a priori.

• Next, we use the selected instruments to estimate the parameter $\alpha_2$ describing the
relationship between $X$ (GDP) and $Y$ (growth rate).

The results obtained with LOLA are excellent for verifying empirically the conditional
convergence hypothesis: the estimate of the parameter $\alpha_2$ is negative and, roughly speaking, the null
hypothesis "$H_0 : \alpha_2 = 0$" is rejected even for small prescribed significance levels. Some
determinants of the growth rate are identified and we notice differences according to the periods
of time considered in the study. Moreover, some selected variables are the same as the
instruments selected in Belloni and Chernozhukov (2010), even though we consider a larger number
of predictors.

Let us come back now to the more general setting $Y = X\beta + u$ where $X = (X_1, \dots, X_p)$ is
a list of $p$ predicting variables. These models belong to a stream of papers devoted to the
estimation of $\beta$ as well as to selection in a high dimensional setting. The most accomplished
methods in this context have in common a crucial assumption called the "Sparsity Assumption".
Roughly speaking, it is assumed that, even if the model depends on a huge number of
parameters, only a (very) small number of them are significant. Hence a selection step is conceivable.
Basically, the numerous methods proposed in this context can be classified in three
categories: the filtering methods retaining the best explanatory variables (using various
criteria), the 'wrapping' methods (among them the greedy algorithms) operating a selection in a
stepwise way, and the optimization methods (such as the Dantzig selector or Lasso minimization)
minimizing a criterion combining an empirical risk with a penalization term related to the number
of retained coefficients. Although it is impossible to be exhaustive in such a productive domain,
we cite among many others Fan and Lv (2010, 2008), Tibshirani (1996), Candes and Tao (2007),
Bunea, Tsybakov, and Wegkamp (2007), Needell and Tropp (2009), Tropp and Gilbert (2007),
Barron, Cohen, Dahmen and DeVore (2008), Haury, Jacob, and Vert (2010) (and apologize
for all the papers which would deserve to be cited here). It is a common feeling
in the applied community that these methods are generally more tailored to prediction, that they
often generate instability in the selection step, and that the simplest ones (filtering) finally behave
very honorably for selection criteria. We therefore detail here a selection method (LOL) which sits in
between filtering and wrapping methods: it has only two selection steps and remains one
of the simplest. This method has been proved to have, despite its simplicity, optimal
theoretical properties for the prediction criterion (see Kerkyacharian, Mougeot, Picard and Tribouley
(2009) and Mougeot, Picard and Tribouley (2010)).

In this paper, we investigate the selection properties of this procedure. From a theoretical
point of view, we state exponential convergence of the false positive and false negative rates
under fairly general conditions. Then, our main focus is to study the practical performance of
the selection procedure in the particular domain of instrumental variables. Let us describe the
LOL selection procedure in more detail. Two consecutive thresholding steps are performed. In
each of these steps we 'kill' variables, and the result of our selection is the set of variables which
have successfully passed both steps.

• The first step is a thresholding procedure which reduces the dimensionality of the
problem in a rather rough way by a simple inspection of the "correlations" computed between
the target variable and the predictors.

• The least squares method is then used on the linear sub-model defined by the variables
retained after the first step. The second thresholding step is performed on the estimates
of the parameters of the sub-model. This step is more refined and corresponds to a denoising
phase of the algorithm.

This procedure is called LOL for "Learning Out of Leaders". The thresholding levels $t$ and $s$
used in the two steps are the inputs of the LOL algorithm and are set by the statistician. Theoretical
results are established in terms of $s$ and $t$; more precisely, Proposition 1 states the consistency
of the LOL algorithm in the sense that the number of false detections as well as the number of
false negatives tend to zero when the number of observations tends to infinity. Some assumptions
are obviously needed to obtain the convergence properties: the sparsity assumption (only a few
parameters are significant, even if we do not know their number and position) and the significance
assumption (the 'significant' coefficients are above a 'significant' level). Both assumptions are
standard, particularly in the high dimensional setting. We also address in an experimental way
some problems which are not solved in the theoretical part. Among them, we focus on building
an operational procedure which is auto-driven (i.e. the levels $s$ and $t$ are chosen in an adaptive
way). This procedure is called LOLA, for LOL completed by an algorithm A allowing
adaptation. We illustrate LOLA with simulations in the very
common case where the predictors are Gaussian variables. We first observe that LOLA appears
to be an extremely accurate procedure when the predictors are independent or when the number
of predictors to be selected is small. Hence, in a second step, we relax these ideal conditions
by considering dependent variables and increasing the number of variables to be selected. We
observe that the results given by the LOLA procedure are still very convincing. We also investigate
the performance of LOLA in a toy instrumental setting related to the Boston Housing data.
Finally, we address the convergence hypothesis problem in economics using the Barro and Lee data.

The paper is organized as follows. In Section 2, the LOL algorithm is described, and the general
model and the hypotheses on the model are presented. Theoretical results on the consistency
of the LOL procedure are stated in Theorem 1. In Section 3, we explain how to modify LOL
into LOLA to obtain a data-driven procedure and we explore the practical performance of LOLA
with some simulations. Section 4 explores a first toy example using a classical
dataset (Boston Housing) with a huge additional set of simulated 'instrumental' variables. Before
turning to real data, the purpose of this section is to verify the ability of our algorithm to
accurately select the appropriate variables when a large set of inappropriate variables
is added. Finally, in Section 5 we focus on the central question and we empirically verify the
convergence hypothesis used in the Solow-Swan-Ramsey growth model. The proofs of the theoretical
results are detailed in Section 6.

2 LOL and theoretical properties 

The selection procedure needs two tuning parameters, which are inputs given by the user. The
assumptions needed to obtain the theoretical results are presented in this setting. The consistency
of the procedure is established in Theorem 1.

2.1 LOL Procedure 

The selection algorithm is denoted LOL($X$, $Y$, $t$, $s$): the inputs are the target variable $Y$,
the predictor variables $(X_1, \dots, X_p)$ and the two tuning parameters $t$, $s$ specified by the user.
As output, the procedure provides the set of indices $\hat{\mathcal{I}}$ of the selected variables.

$\hat{\mathcal{I}} \leftarrow$ LOL($X$, $Y$, $t$, $s$)
Input : target $Y$, regression variables $X = (X_1, \dots, X_p)$, tuning parameters $t$, $s$
Output : set of indices of the selected regression variables $\hat{\mathcal{I}}$

Let us describe LOL($X$, $Y$, $t$, $s$). The regression variables $X$ are first normalized: we define
$\tilde X = XD$ where $D$ is the $(p \times p)$ diagonal matrix $D_{\ell m} := \sigma_\ell^{-1} I\{\ell = m\}$ and
$\sigma_\ell^2 = n^{-1}(X_\ell^t X_\ell)$ is the empirical second moment of the predictor $X_\ell$. The coherence
of the matrix of the normalized predictors is then computed:

$$\tau_n = \sup_{\ell \neq m} \Big| \frac{1}{n} \sum_{i=1}^{n} \tilde X_{i\ell} \tilde X_{im} \Big|$$

The 'correlations' (scalar products) between the target variable $Y$ and the normalized predictors
$\tilde X_\ell$ are then sorted in decreasing order:

$$|\tilde X_\cdot^t Y|_{(1)} \ge |\tilde X_\cdot^t Y|_{(2)} \ge \dots \ge |\tilde X_\cdot^t Y|_{(p)}.$$

The leaders are the predictors associated with the highest 'correlations', with indices belonging
to the set

$$\mathcal{J} = \big\{\ell = 1, \dots, p,\ |\tilde X_\ell^t Y| \ge \big(|\tilde X_\cdot^t Y|_{(\lfloor \tau_n^{-1} \rfloor)} \vee t\big)\big\} \qquad (1)$$

where $t$ is one of the inputs of the LOL algorithm. Denote by $\tilde X_{\mathcal J}$ the extracted matrix defined by

$$(\tilde X_{\mathcal J})_{i\ell} = \tilde X_{i\ell} \quad \text{for any } \ell \in \mathcal J \text{ and } i \in \{1, \dots, n\}.$$

The Ordinary Least Squares (OLS) estimator for the pseudo linear model restricted to the leaders is

$$\forall \ell = 1, \dots, p, \quad \hat\beta_\ell = \big( (\tilde X_{\mathcal J}^t \tilde X_{\mathcal J})^{-1} \tilde X_{\mathcal J}^t Y \big)_\ell \, I\{\ell \in \mathcal J\}$$

and the set of indices of the selected predictors is finally given by

$$\hat{\mathcal I} = \{\ell = 1, \dots, p,\ |\hat\beta_\ell| \ge s\}$$

where $s$ is the second input parameter of the LOL algorithm.
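
The two thresholding steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `solve` and `lol` are hypothetical, the scalar products are divided by $n$ so that they live on the scale of the coefficients (an assumption), the cap on the number of leaders is exposed as an optional argument `max_leaders` instead of being derived from the coherence, no column of $X$ is identically zero, and indices are 0-based.

```python
import math

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    k = len(b)
    M = [A[r][:] + [b[r]] for r in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for c in reversed(range(k)):
        tail = sum(M[c][j] * x[j] for j in range(c + 1, k))
        x[c] = (M[c][k] - tail) / M[c][c]
    return x

def lol(X, Y, t, s, max_leaders=None):
    """Two-step selection; X is a list of n rows of length p, Y a list of length n."""
    n, p = len(X), len(X[0])
    # Normalize each column by the square root of its empirical second moment.
    sigma = [math.sqrt(sum(X[i][l] ** 2 for i in range(n)) / n) for l in range(p)]
    Xn = [[X[i][l] / sigma[l] for l in range(p)] for i in range(n)]
    # Step 1: keep as leaders the predictors whose 'correlation' with Y is at least t.
    corr = [abs(sum(Xn[i][l] * Y[i] for i in range(n))) / n for l in range(p)]
    leaders = sorted((l for l in range(p) if corr[l] >= t), key=lambda l: -corr[l])
    if max_leaders is not None:  # optional cap on the number of leaders (assumption)
        leaders = leaders[:max_leaders]
    leaders.sort()
    if not leaders:
        return []
    # Step 2: OLS restricted to the leaders (normal equations), then threshold at s.
    G = [[sum(Xn[i][a] * Xn[i][b] for i in range(n)) for b in leaders] for a in leaders]
    rhs = [sum(Xn[i][a] * Y[i] for i in range(n)) for a in leaders]
    beta = solve(G, rhs)
    return [l for l, b in zip(leaders, beta) if abs(b) >= s]

# Toy check on an orthogonal +/-1 design: Y depends on columns 1 and 3 only.
cols = [[1] * 8, [1, -1] * 4, [1, 1, -1, -1] * 2, [1] * 4 + [-1] * 4]
X = [[c[i] for c in cols] for i in range(8)]
Y = [3 * row[1] - 2 * row[3] for row in X]
print(lol(X, Y, t=1.0, s=1.0))  # [1, 3]
```

The normal equations are solved by hand only to keep the sketch free of dependencies; with a numerical library one would use a least squares routine instead.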

2.2 Model assumptions and selection criteria 

We consider a Gaussian (or sub-Gaussian) high dimensional linear model. More precisely, we
assume that the target variable $Y$ and the $p$ predictors $X = (X_1, \dots, X_p)$ are linked through
the linear regression model

$$Y = X\beta + \varepsilon$$

where $\beta \in \mathbb{R}^p$ is an unknown parameter and $\varepsilon = (\varepsilon_i)_{i=1,\dots,n}$ is a vector of independent
Gaussian variables $\mathcal{N}(0, \eta^2)$. In the selection problem, it is generally assumed that only a few
coefficients are non zero. The set of non zero coefficients

$$\mathcal I = \{\ell = 1, \dots, p,\ \beta_\ell \neq 0\}$$

is precisely the set to be estimated. A sparsity condition is introduced to force the cardinality
of $\mathcal I$ to be less than $S$. In practice this kind of assumption is not realistic, so a relaxed setting is
considered here: only a few coefficients are larger than a first threshold $s_n$, and the significant
coefficients (the ones we definitely want to detect) are larger than a second threshold $t_n$. More
precisely, we assume:

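- Sparsity Conditions: There exist $S > 0$ and threshold levels $t_n$ and $s_n$ such that

$$\#\{\ell \in \{1, \dots, p\},\ |\sigma_\ell \beta_\ell| \ge s_n/2\} \le S \quad \text{and} \quad \sum_{\ell=1}^{p} |\sigma_\ell \beta_\ell|^2 \, I\{|\sigma_\ell \beta_\ell| \le 2 t_n\} \le \frac{[\dots]}{n} \qquad (2)$$

where the $\sigma_\ell$ are the normalizing factors defined in Section 2.1.

- Size Condition: There exists a positive constant $M$ such that

$$\sum_{\ell=1}^{p} |\sigma_\ell \beta_\ell| \le M \qquad (3)$$

- Significance: There exists a sequence $\mu_n > 0$ such that the set $\mathcal I$ of coefficients to be
detected is defined by

$$\mathcal I := \{\ell \in \{1, \dots, p\},\ |\sigma_\ell \beta_\ell| \ge \mu_n\} \qquad (4)$$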

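Recall that a selection procedure whose output is denoted by $\hat{\mathcal I}$ is said to be consistent if
$P(\hat{\mathcal I} = \mathcal I)$ tends to 1 when the number $n$ of observations tends to infinity. More
precisely, to evaluate the quality of $\hat{\mathcal I}$, the number of False Positive decisions FP and the
number of False Negative decisions FN are generally computed:

$$FP := \sum_{\ell=1}^{p} I\{\ell \notin \mathcal I\}\, I\{\ell \in \hat{\mathcal I}\} \quad \text{and} \quad FN := \sum_{\ell=1}^{p} I\{\ell \in \mathcal I\}\, I\{\ell \notin \hat{\mathcal I}\}.$$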
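
As a small illustration of these two error counts, the sketch below computes FP and FN from a selected index set and a true index set. The function name `selection_errors` is hypothetical and the indices are arbitrary labels.

```python
def selection_errors(selected, truth):
    """Return (FP, FN) for a selected index set against the true index set."""
    selected, truth = set(selected), set(truth)
    fp = len(selected - truth)  # indices selected although not in the true set
    fn = len(truth - selected)  # indices of the true set that were missed
    return fp, fn

# Indices 1 and 3 are correctly found, 7 is a false positive, 5 is missed.
print(selection_errors([1, 3, 7], [1, 3, 5]))  # (1, 1)
```

A procedure is consistent in the above sense when both counts are zero with probability tending to 1.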

2.3 Performances 

The performance of the LOL algorithm depends (and this is a common feature in
regression problems) on the regression matrix $X$, and particularly on the coherence $\tau_n$ of the
$(p \times p)$ matrix $(D X^t X D)$, defined as

$$\tau_n = \sup_{\ell \neq m} \Big| \frac{1}{n} (D X^t X D)_{\ell m} \Big|$$

where $D$ is the $(p \times p)$ diagonal matrix $D_{\ell m} := \sigma_\ell^{-1} I\{\ell = m\}$ and
$\sigma_\ell^2 := \frac{1}{n} \sum_{i=1}^{n} X_{i\ell}^2$ is the empirical
second moment of the variable $X_\ell$. This quantity is important because it induces a bound on the size
of the invertible sub-matrices built with the columns of $(D X^t X D)$, and thus on the maximum
number of leaders used by our algorithm. For more details, we refer to Mougeot, Picard and
Tribouley (2010).
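
The empirical coherence can be computed directly from its definition. The sketch below is a plain-Python illustration (the function name `coherence` is hypothetical, and columns are assumed not to be identically zero); it normalizes each column by its empirical second moment and returns the largest absolute normalized scalar product between two distinct columns.

```python
import math

def coherence(X):
    """Empirical coherence of the columns of X (list of n rows of length p)."""
    n, p = len(X), len(X[0])
    sigma = [math.sqrt(sum(X[i][l] ** 2 for i in range(n)) / n) for l in range(p)]
    Xn = [[X[i][l] / sigma[l] for l in range(p)] for i in range(n)]
    tau = 0.0
    for l in range(p):
        for m in range(l + 1, p):  # off-diagonal entries only
            g = abs(sum(Xn[i][l] * Xn[i][m] for i in range(n))) / n
            tau = max(tau, g)
    return tau

# Orthogonal columns give coherence 0; proportional columns give coherence 1.
orthogonal = [[1, 1], [1, -1], [1, 1], [1, -1]]
collinear = [[1, 2], [2, 4], [3, 6]]
print(coherence(orthogonal))
print(round(coherence(collinear), 12))
```

A value close to 1 signals that at least two predictors are nearly collinear after normalization.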

The consistency results are first stated for general threshold levels $t_n$, $s_n$.

Theorem 1. Suppose that $p \le \exp(an)$ for some constant $a > 0$. Choose the thresholds $t_n$, $s_n$
such that $t_n \ge s_n \vee \tau_n \vee \sqrt{\log p / n}$. Recall that $\hat{\mathcal I}$ is the output of the algorithm LOL($X$, $Y$, $t_n$, $s_n$).
Then

$$P(\hat{\mathcal I} = \mathcal I) \ge 1 - \exp(-c n s_n^2)$$

as soon as the vector $\beta$ satisfies the sparsity conditions (2) as well as the size and significance
conditions (3), (4) for

$$\mu_n \ge O\big(t_n \vee \tau_n s_n \sqrt{n} \vee s_n |\log \tau_n|\big) \quad \text{and} \quad S \le O(n s_n^2).$$

Theorem 1 is a consequence of the following proposition, which gives distinct evaluations
of the errors induced by the false positive detections and the false negative detections. It is
interesting to observe that slightly different assumptions are needed for the two types of errors: for
instance, no explicit condition on $S$ is needed to ensure the convergence of the rate of False
Negative detections.

Proposition 1. Let $k$ be a given positive number. Assume that $p \le \exp(an)$ for some $a > 0$.
- False Negative (FN): Choose the threshold levels $t_n \ge s_n$ and assume that the vector
$\beta$ satisfies the sparsity conditions (2) as well as the size and significance conditions (3),
(4) for



Then there exists a constant $c > 0$ such that

$$P(FN > k) \le \exp(-c k n \mu_n^2).$$

- False Positive (FP): Choose the thresholds

$$t_n \ge s_n \vee O\big(\sqrt{\log p / n} \vee \tau_n\big) \qquad (6)$$

and assume that the vector $\beta$ satisfies the sparsity conditions (2) as well as the size and
significance conditions (3), (4) for $S \le O(n k s_n^2)$.
Then there exists some constant $c > 0$ such that

$$P(FP > k) \le \exp(-c k n s_n^2). \qquad (7)$$

Observe that the requirements on the thresholds are quite loose, but clearly the performance
of the algorithm suffers when the thresholds are too low (this weakens the exponential rates,
see (6) and (7)). It also suffers when the thresholds are too high, since the significant coefficients
have to be above the thresholds (see (5)) and then only very large coefficients are detected.

A specific case of interest is

$$\tau_n = O\big(\sqrt{\log p / n}\big).$$

For instance, it can easily be shown that this case occurs with overwhelming
probability when the entries of $X$ are i.i.d. Gaussian $\mathcal{N}(0, \sigma^2)$. This case is considered in detail in the
following simulation study.

Corollary 1. Assume $\tau_n = O(\sqrt{\log p / n})$. The algorithm LOL($X$, $Y$, $O(\sqrt{\log p / n})$, $O(\sqrt{\log n / n})$)
is consistent as soon as the vector $\beta$ satisfies the sparsity, size and significance conditions for

$$\mu_n \ge O\big(\sqrt{\log p \, \log n / n}\big) \quad \text{and} \quad S \le O(\log n).$$
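
To fix ideas on the orders of magnitude involved in Corollary 1, the following Python sketch evaluates the rates for $n = 400$ and $p = 2000$, the sizes used in the simulation study of Section 3. The function name `corollary_rates` is hypothetical, and all the constants hidden in the $O(\cdot)$ are set to 1, an assumption made only for illustration.

```python
import math

def corollary_rates(n, p):
    """Rates of Corollary 1 with all O(.) constants set to 1 (illustrative assumption)."""
    return {
        "t_n": math.sqrt(math.log(p) / n),                 # first threshold, coherence bound
        "s_n": math.sqrt(math.log(n) / n),                 # second threshold
        "mu_n": math.sqrt(math.log(p) * math.log(n) / n),  # size of detectable coefficients
        "S_max": math.log(n),                              # admissible sparsity
    }

rates = corollary_rates(400, 2000)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

With these conventions the two thresholds are roughly 0.14 and 0.12, and the detection level is roughly 0.34; the actual constants are not specified by the corollary.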

3 Adaptive LOL and practical properties 

In this part we address in an experimental way some problems which are not solved in
the theoretical part. Among them, the most crucial question is how to choose the tuning
parameters $t_n$ and $s_n$. We focus on building an auto-driven procedure, which is illustrated with
some simulations.

3.1 LOLA Procedure 

The auto-driven procedure is called LOLA: it is LOL( ) completed by a
procedure A( ) allowing the adaptive tuning of the threshold levels $t_n$ and $s_n$:

$$\mathrm{LOLA}(X, Y) = \mathrm{LOL}(X, Y, \hat t, \hat s)$$

with the following choices of the tuning parameters

$$\hat t = A\big((\tilde X_1^t Y, \dots, \tilde X_p^t Y)\big) \quad \text{and} \quad \hat s = A\big((\hat\beta_1, \dots, \hat\beta_p)\big)$$

where $\tilde X$ and $\hat\beta$ are defined in the LOL( ) algorithm. The algorithm A( ) is described by

$u \leftarrow$ A($Z$)
Input : variables $Z = (Z_1, \dots, Z_m)$
Output : level $u$

and the output $u$ is computed as follows. Let $|Z|_{(1)} \le |Z|_{(2)} \le \dots \le |Z|_{(m)}$ be the ordered
sample and consider the deviance function defined by

$$\mathrm{dev}(J) = \sum_{j=1}^{J} \big(|Z|_{(j)} - \overline{|Z|}^{(1,J)}\big)^2 + \sum_{j=J+1}^{m} \big(|Z|_{(j)} - \overline{|Z|}^{(J+1,m)}\big)^2$$

where $\overline{|Z|}^{(1,J)}$ and $\overline{|Z|}^{(J+1,m)}$ are the empirical means of the $|Z|_{(j)}$'s for respectively $j = 1, \dots, J$
and $j = J + 1, \dots, m$. We choose as threshold level

$$u = |Z|_{(\hat J)} \quad \text{for} \quad \hat J = \arg\min_{J = 1, \dots, m} \mathrm{dev}(J).$$

Notice that A( ) is entirely data-driven and can be roughly justified as follows: since the
thresholds are used to select the highest responses $|Z|$ (here either the scalar products
$|\tilde X_\ell^t Y|$ or the estimates $|\hat\beta_\ell|$ of the linear coefficients) among a set of given variables, we aim
at splitting the set of variables into two clusters in such a way that the highest responses form
one of the two clusters. The output of the A( ) algorithm is the frontier between the
clusters: it is chosen by minimizing the deviance dev($J$), that is, the sum of the squared deviations
within the two classes (see also Kerkyacharian,
Mougeot, Picard and Tribouley (2009)). Obviously, the A( ) algorithm performs better when both
clusters are distinctly separated, which is the case in our theoretical setting since the sparsity
assumption suggests that the law of the $|Z|$ should be a mixture of two
distributions: one for the variables included in the model (positive mean) and one for the
others, which should be very small (zero mean).

Observe that the A( ) algorithm is of independent interest, since it could be useful in many other
nonparametric settings such as denoising or density estimation where a thresholding procedure
is performed. For instance, the input of A( ) could be the empirical wavelet coefficients $\hat\beta_{j,k}$
when local thresholding is considered.
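
The adaptive choice of the threshold can be sketched as follows in plain Python. The names `within_ss` and `adaptive_threshold` are hypothetical, and the deviance is recomputed from scratch for every split, which is quadratic in $m$ but keeps the sketch close to the definition. Following the text, the returned level is the largest absolute value of the lower cluster, so a strict comparison with the level isolates the upper cluster.

```python
def within_ss(values):
    """Sum of squared deviations of the values around their empirical mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def adaptive_threshold(z):
    """Split the sorted absolute values into two clusters minimizing dev(J)."""
    a = sorted(abs(v) for v in z)
    m = len(a)
    if m == 0:
        raise ValueError("empty input")
    # dev(J) for J = 1, ..., m: lower cluster a[:J], upper cluster a[J:].
    devs = [within_ss(a[:J]) + within_ss(a[J:]) for J in range(1, m + 1)]
    j_hat = 1 + min(range(m), key=devs.__getitem__)
    return a[j_hat - 1]

# Four small responses and two large ones: the frontier sits at the largest small value.
z = [0.1, -0.2, 0.15, 5.0, -5.2, 0.05]
u = adaptive_threshold(z)
print(u, [v for v in z if abs(v) > u])
```

In the toy example the level is 0.2 and the two large responses are the only ones strictly above it.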

3.2 Illustration with simulations 

The performance of the LOLA algorithm is first presented in a classical framework
where the predictors are realizations of Gaussian variables. Intensive studies have been performed
and we present here the results for $n = 400$ and $p = 2000$. The LOLA procedure is repeated
$K = 100$ times, each time using $n$ different random observations. Observations are simulated
from the model $Y = X\beta + \varepsilon$ where $\varepsilon$ is a Gaussian vector such that the signal-to-noise
ratio satisfies SNR = 5. The cardinality of the set of indices of the predictors to be selected,
$\mathcal I = \{\ell = 1, \dots, p,\ \beta_\ell \neq 0\}$, is denoted $S$. Three experiments are presented:

- Exp1: the predictors $X_1, \dots, X_p$ are independent and $S = 10$

- Exp2: the predictors $X_1, \dots, X_p$ are linearly dependent and $S = 10$

- Exp3: the predictors $X_1, \dots, X_p$ are independent and $S = 50$.
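
A design of the type used in Exp1 and Exp3 (independent Gaussian predictors) can be simulated as in the sketch below. Several choices are assumptions made for illustration and are not specified in the text: the function name `simulate`, the non zero coefficients set to $\pm 2$, the support drawn uniformly at random, and the SNR understood as the ratio of the empirical energy of the signal $X\beta$ to the empirical energy of the noise.

```python
import math
import random

def simulate(n, p, S, snr=5.0, coef=2.0, seed=0):
    """Simulate Y = X beta + noise with S non zero coefficients and a given SNR."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    support = sorted(rng.sample(range(p), S))
    beta = [0.0] * p
    for l in support:
        beta[l] = coef * rng.choice((-1.0, 1.0))
    signal = [sum(X[i][l] * beta[l] for l in support) for i in range(n)]
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Rescale the noise so that (signal energy) / (noise energy) equals snr.
    scale = math.sqrt(sum(v * v for v in signal) / (snr * sum(v * v for v in raw)))
    Y = [signal[i] + scale * raw[i] for i in range(n)]
    return X, Y, beta, support

# Small instance; the text uses n = 400, p = 2000 and S = 10 or 50.
X, Y, beta, support = simulate(n=40, p=100, S=5, seed=1)
print(len(X), len(X[0]), support)
```

Dependent predictors as in Exp2 would require an additional mixing step, which is not sketched here.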

The empirical coherences computed on the $n$-sample of the predictors $X = (X_1, \dots, X_p)$
are

            Exp1   Exp2   Exp3
$\tau_n$    0.25   0.72   0.25

and deliver the following message: the results for Exp2 should be considered with caution
since $\tau_n$ is large. It is also a benefit of LOLA that it computes an empirical indicator giving a warning
sign to the user.

The two successive thresholding steps of the algorithm are first detailed using the three
examples presented above. Figure 1 illustrates the first step of LOLA, the selection
of the leaders, for one experiment chosen among the $K = 100$ experiments. All the scalar products
$|\tilde X_\ell^t Y|$ for $\ell = 1, \dots, p = 2000$ are represented in the picture; the estimated level $\hat t$ computed
with procedure A( ) is indicated with a horizontal line. The leaders are all the variables with
a scalar product $|\tilde X_\ell^t Y|$ exceeding the threshold and are labeled with a small cross; let $\mathcal L$ denote
the set of leader indices. The indices belonging to $\mathcal I$ are circled to indicate the variables which
should be selected.

Observe that the reduction of the dimensionality is drastic: $N = 144$ leaders are selected
during the first step among the $p = 2000$ initial predictors. When the sparsity is small and
the predictors are independent (see Exp1), the values of the scalar products of the predictors
really involved in the model are close to the values of the coefficients (here $|\beta_\ell| \approx 2$). For Exp1, the
first step works well since every variable in $\mathcal I$ is chosen as a leader. As the sparsity increases
or when the predictors become dependent, as is the case for Exp2 or Exp3, the empirical scalar
products between the predictors and the target may behave in a more unstable way, so that some
significant coefficients fall below the threshold $\hat t$ and are consequently not selected as leaders
during the first thresholding step. For Exp3, three variables with indices $\ell \in \{612, 790, 1338\}$ which should
be kept are eliminated at the first step and are definitively lost for selection.

Figure 2 illustrates the effect of the second thresholding step of the LOLA procedure for the same
experiments as in Figure 1.

The horizontal lines represent the levels $\pm\hat s$ where $\hat s$ is the output of procedure A( ). Figure
2 gives the estimated coefficients $\hat\beta_\ell$ for $\ell \in \mathcal L$ computed with the OLS method restricted to the
subset of predictors selected as leaders. A circle marks the coefficients $\hat\beta_\ell$ with $\ell \in \mathcal I$
which are kept: this label shows the number TP of true positive detections. A triangle marks
the coefficients $\hat\beta_\ell$ with $\ell \in \mathcal I$ which are below the level $\hat s$ (in absolute value):
this label shows the number FN of false negative detections. A diamond marks
the coefficients $\hat\beta_\ell$ with $\ell \notin \mathcal I$ which are kept by LOLA: this label shows the number
FP of false positive detections. The results for Exp1 are excellent: the two clusters of coefficients
$\hat\beta_\ell$ are well separated and the A( ) algorithm performs very accurately in this situation; we obtain
FN = FP = 0 and we get exactly $\hat{\mathcal I} = \mathcal I$. For Exp3, the separation between the two clusters is not
so clear, and missed detections (triangles) as well as false detections (crosses not circled) are
observed: FN = 8 and FP = 7.

The global performance obtained over the $K$ experiments is presented in Table 1. For each
column, the number of indices to be estimated is indicated in brackets. The first two columns
focus on true detections and the last ones on false detections. Results for Exp1 are perfect:
the LOLA algorithm always selects the right predictors and there is no error. When the sparsity $S$
increases, the results are less powerful: some predictors (FN = 13.9) which should be selected
by LOLA are finally not selected. Nevertheless, we observe that the number of false positive
detections is again small (FP = 1.2).

3.3 Conclusion and Comments 

In order to be concise, the above illustrations focused only on Gaussian
random variables with SNR = 5. It should be stressed that an extensive simulation study has
been conducted in parallel in order to evaluate the performance of LOLA. Various distributions
for the predictors and different SNR values have been implemented and studied. As a conclusion of this
experimental design, for a given number of observations and potential
predictors, the selection is more accurate for a low sparsity level. The numbers of false negatives
and false positives tend to zero as the number of observations increases and/or as the sparsity
level decreases, for a fixed number of predictors. It should be
noted that very similar results are obtained if the Gaussian predictors are replaced by uniform
or Bernoulli random variables, or by mixtures of the above distributions.

As can be seen in Corollary 1, the LOL procedure is consistent under the condition
$\tau_n \le O(\sqrt{\log p / n})$ on the coherence. This condition is satisfied with overwhelming probability, for instance,
when the entries of the matrix $X$ are independent and identically distributed random variables with a
sub-Gaussian common distribution, but the results obtained in Exp2 show that the procedure still
works quite well even if this hypothesis is not satisfied. This fits a common fact in the high
dimensional setting, namely that the theoretical results are often more pessimistic than the true
performance of the procedures. Moreover, it should be noticed
that computing $\tau_n$ before running the procedure is useful as an indication of potential misbehavior.

4 LOLA properties in a toy 'instrumental' setting 

We begin by studying the practical quality of our algorithm on real data combined with
simulated instrumental data, by revisiting the Boston Housing data set available from the UCI
machine learning repository: http://archive.ics.uci.edu/ml/. One of our goals here is
to evaluate the ability of the LOLA procedure to accurately select the right predictors even if the
original variables are embedded in a huge space built with artificial variables. This analysis also
helps to point out which kinds of variables are wrongly selected from the complementary artificial
space and which ones are not selected from the initial space.

The original Boston Housing data are defined by one continuous target variable $Y$ (the
median value of owner-occupied homes in USD) and $p_0 = 13$ predictive variables observed for
$n = 506$ observations. For our purpose, these original data are embedded in a high dimensional
space of size $p = 2100 + 13$ by artificially adding 300 independent random variables for each of 7
different laws (normal, lognormal, Bernoulli, uniform, exponential, Student and Cauchy), in equal
proportion. This set of distributions is chosen to mimic the different empirical laws of the $p_0 = 13$
original predictive variables. In order to numerically evaluate the performance of LOLA, the
procedure is run $K = 100$ times. Each time, a new set of artificial 'instruments' is
simulated. Also, to evaluate the instability of the algorithm, each run is
performed on a randomly chosen $0.75n$ sub-sample of the initial $n$-sample.
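
The construction of the artificial 'instruments' can be sketched as follows. The parameters of the seven laws are not given in the text, so the choices below (standard normal, standard lognormal, Bernoulli(1/2), uniform on [0, 1), exponential with rate 1, Student with 2 degrees of freedom, standard Cauchy) are assumptions, as is the function name `artificial_instruments`.

```python
import math
import random

def artificial_instruments(n, per_law=300, seed=0):
    """Draw per_law independent columns of length n for each of 7 laws."""
    rng = random.Random(seed)

    def student(df=2):
        # Standard normal divided by the square root of an independent chi-square / df.
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)

    laws = [
        ("normal", lambda: rng.gauss(0.0, 1.0)),
        ("lognormal", lambda: rng.lognormvariate(0.0, 1.0)),
        ("bernoulli", lambda: 1.0 if rng.random() < 0.5 else 0.0),
        ("uniform", lambda: rng.random()),
        ("exponential", lambda: rng.expovariate(1.0)),
        ("student", student),
        ("cauchy", lambda: math.tan(math.pi * (rng.random() - 0.5))),
    ]
    names, columns = [], []
    for name, draw in laws:
        for _ in range(per_law):
            names.append(name)
            columns.append([draw() for _ in range(n)])
    # Return the n x (7 * per_law) matrix as a list of rows, plus the law of each column.
    Z = [[col[i] for col in columns] for i in range(n)]
    return Z, names

Z, names = artificial_instruments(n=20, per_law=3, seed=1)
print(len(Z), len(Z[0]), sorted(set(names)))
```

With the default `per_law=300` the function returns the 2100 artificial columns to be appended to the 13 original predictors.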

Since the initial data are observed (and not simulated), we do not know in advance which
variables should be selected in the model. In order to evaluate the performance of the LOLA
algorithm, we consider two benchmark procedures: a) the classical multiple regression followed
by variable selection using a simple Student test, b) the stepwise regression method
(level 95%). Obviously, the two benchmark procedures are performed in the regular data space
with $p_0 = 13$ variables, again $K = 100$ times, by randomly choosing $0.75n$ observations
among the original data set.

In the high dimensional model (p = 2113), the empirical coherence is equal to 0.98, which is 
very high and indicates that the predictors are very dependent. We applied the LOLA procedure 
to this huge set of data. The results show that the number of false detections of the artificial 
variables is extremely low : we only select 92 added variables out of the 2100 random variables, for a 
total of K = 100 experiments. It is interesting to observe that half of the selected variables are 
distributed according to the Cauchy law and 10% according to the T(2) law. A 
complementary work shows that the impact of heavy tailed distributions for the predictors is 
similar to the impact of dependence between predictors. 
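
To make the notion of empirical coherence concrete, here is a minimal sketch. It assumes the usual definition (the largest absolute inner product between two distinct columns once every column is scaled to unit norm); the function name `empirical_coherence` is hypothetical, and the exact normalization used for the value 0.98 above may differ.

```python
import math

def empirical_coherence(columns):
    """Largest absolute normalized inner product between two distinct columns.

    Assumes every column has a non-zero Euclidean norm.
    """
    unit = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        unit.append([v / norm for v in col])
    best = 0.0
    for a in range(len(unit)):
        for b in range(a + 1, len(unit)):
            dot = sum(u * v for u, v in zip(unit[a], unit[b]))
            best = max(best, abs(dot))
    return best

# Two orthogonal columns and a third one equal to their sum.
cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
print(round(empirical_coherence(cols), 3))   # 0.707
```

A value close to 1, as observed here, means that at least two predictors are almost collinear.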

Figure 3 shows the frequencies of detection for the initial $p_0 = 13$ predictors using the LOLA 
procedure, OLS with Student test and stepwise regression. The results obtained with LOLA and 
OLS with Student test are very similar. This comparison confirms that our procedure 
performs fairly well in the presence of a huge number of artificial 'instrumental variables'. 

This preliminary investigation, as well as the previous simulation study, 
justifies the use in the sequel of LOLA as a selection algorithm in the presence of a large 
set of instrumental variables and strengthens the conclusions of the following section. 

5 International Economic Growth 

We study in this section the convergence hypothesis in economics presented in the 
introduction. We use the Barro and Lee data available from http://www.nber.org/pub/barro.lee. 
We present different models to evaluate the causal effect of an initial level X of gross domestic 
product on the growth rate (of gross domestic product) using parametric models and economic 
variables as well as non-parametric models. Our aim is to empirically prove that the convergence 
hypothesis is valid and to compare our results with the results presented in Belloni and 
Chernozhukov (2010). 

The Barro and Lee data are extracted as in Belloni and Chernozhukov (2010) and missing 
data are removed for each studied case. Notice that we restrict the number of countries (n) 
in the study since we want to take into consideration a large number of variables (p). It is 
important to keep this in mind for the interpretation of the results. 

5.1 The data 

The national growth rate in gross domestic product (GDP) per capita is our dependent 
variable Y and is studied for different periods of time. The predictor X is an initial amount 
of the GDP per capita. More precisely, the variables are defined by 
$Y = \log(GDP_{t_2}/GDP_{t_1})$ and $X = \log(GDP_{t_0})$, where the periods $[t_1, t_2]$ and the 
initial dates $t_0$ are those of the four experiments Exp1 to Exp4. 



The Barro and Lee data contain different economic indicators from 1960 to 1985. Six 
categories of variables are considered : Education (1), Population/Fertility (2), Government 
Expenditures (3), PPP deflators (4), Political variables (5), Trade Policy and others (6). 

In order to measure the accuracy of LOLA, to increase the stability of the algorithm 
and to see the impact of the sample of countries on the selection and the estimation, we run 
the procedure K = 1000 times, each time on a portion of O.SSn data randomly chosen 
from the initial data set. All the given results (the estimator $\hat\alpha_2$, the bounds of the confidence 
interval computed under the Gaussian hypothesis, the $R^2$ and the p-value associated to the 
global Fisher test) are then averaged over the K = 1000 experiments. The empirical standard 
deviation normalized by $\sqrt{K}$ is given in brackets. The indicator $N_0$ is the frequency of the 
event "zero belongs to the confidence interval". Ideally, $N_0 = 0$. The confidence intervals are 
computed for a coverage of 90%, which amounts to testing the null hypothesis $H_0 : \alpha_2 = 0$ against 
$H_1 : \alpha_2 < 0$ at the level 5%. 
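
The repeated sub-sampling scheme and the indicator $N_0$ can be illustrated on the simple model Y = a1 + a2 X + u. The sketch below is not the authors' implementation: the function names, the use of a Gaussian quantile for the interval, the sub-sampling fraction passed as an explicit argument and the synthetic data are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def ols_slope_ci(x, y, coverage=0.90):
    """OLS fit of y = a1 + a2 * x + u; returns a2 and its Gaussian confidence bounds."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    a2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a1 = ybar - a2 * xbar
    rss = sum((yi - a1 - a2 * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)            # standard error of a2
    q = NormalDist().inv_cdf(0.5 + coverage / 2)   # Gaussian quantile (assumption)
    return a2, a2 - q * se, a2 + q * se

def repeated_runs(x, y, K, fraction, seed=0):
    """Average the slope over K random sub-samples and compute the frequency N0."""
    rng = random.Random(seed)
    n, m = len(x), int(fraction * len(x))
    slopes, zero_inside = [], 0
    for _ in range(K):
        idx = rng.sample(range(n), m)
        a2, low, high = ols_slope_ci([x[i] for i in idx], [y[i] for i in idx])
        slopes.append(a2)
        zero_inside += low <= 0.0 <= high          # event "zero belongs to the CI"
    return {
        "alpha2": mean(slopes),
        "sd_over_sqrt_K": stdev(slopes) / math.sqrt(K),
        "N0": zero_inside / K,
    }

# Synthetic data with a true slope of -0.2 (hypothetical, not the Barro and Lee data).
rng = random.Random(1)
x = [rng.uniform(6.0, 10.0) for _ in range(60)]
y = [1.0 - 0.2 * xi + rng.gauss(0.0, 0.05) for xi in x]
print(repeated_runs(x, y, K=200, fraction=0.75))
```

With a 90% two-sided interval, the event "the upper bound is negative" corresponds to rejecting the null hypothesis in the one-sided test at level 5%, which is why a small $N_0$ supports a negative coefficient.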

5.2 LOLA in a parametric setting 

First, it should be noticed, thanks to the results obtained by the standard OLS given in 
Table 2, that the linear model $Y = \alpha_1 + \alpha_2 X + u$ is irrelevant. For all periods of time, the 
$R^2$ is almost zero and the p-value leads to reject the significance of the model. Since $N_0 = 1$, the 
hypothesis $H_0 : \alpha_2 = 0$ is always accepted at level 5%. 

            Period [t1, t2]    Initial date t0    n     p
    Exp1    1965-75            1960               63    208
    Exp2    1965-75            1965               63    208
    Exp3    1975-85            1970               52    375
    Exp4    1975-85            1975               52    375

We now use the LOLA procedure to select the subset of explanatory variables $Z_{\text{selected}}$ 
containing the maximum of information in order to explain X or Y. Taking inspiration from the vast 
literature about instrumental variables in econometrics (see among many others Angrist, Imbens 
and Rubin (1996), Blundell and Powell (2003), Florens (2003), Darolles, Fan, Florens, Renault 
(2010) and Florens, Heckman, Meghir, Vytlacil (2003) in a nonparametric or semi-parametric 
framework), we select the instruments using the two following models 

Model 1 : $X = Z\beta + \text{error}$ and Model 2 : $Y = \alpha_1 + \alpha_2 X + Z\beta + \text{error}$. 

Depending on the situation, each of them has its own justification and interest : Model 1 is used 
in the first step of the Instrumental Variable method, while Model 2 relates to the endogeneity of 
X in the initial model $Y = \alpha_1 + \alpha_2 X + u$, with the possibility that covariates Z are missing. 
The selected instruments $Z_{\text{selected}}$ (obtained either via Model 1 or Model 2) are then used as 
control variables and we consider the following model 

$Y = \alpha_1 + \alpha_2 X + Z_{\text{selected}}\,\beta + \text{error}$, (8) 

where $\alpha_1$, $\alpha_2$ and $\beta$ are estimated by OLS. The results obtained for Model 1 (above) and 
Model 2 (below) are given in Table 3. 
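
For reference, the OLS step in model (8) and the associated confidence interval can be written explicitly. The notation $W$, $\hat\theta$, $\hat\sigma$, $\mathrm{se}$ and $q_{0.95}$ is introduced here for illustration only and does not appear in the original; the formulas are the standard ones under the Gaussian hypothesis, with $q_{0.95}$ denoting the 95% quantile of the reference law (Gaussian, or Student with $n - S - 2$ degrees of freedom), which is an assumption about how the intervals were computed. S denotes the number of selected instruments.

```latex
\begin{align*}
W &= [\mathbf{1}_n \;\; X \;\; Z_{\mathrm{selected}}] \in \mathbb{R}^{n \times (S+2)}, \\
\hat\theta &= (\hat\alpha_1, \hat\alpha_2, \hat\beta)^{t} = (W^{t} W)^{-1} W^{t} Y, \\
\hat\sigma^2 &= \frac{\| Y - W\hat\theta \|^2}{n - S - 2}, \\
\mathrm{se}(\hat\alpha_2) &= \hat\sigma \sqrt{\big[(W^{t} W)^{-1}\big]_{22}}, \\
\mathrm{CI}_{90\%} &= \big[\hat\alpha_2 - q_{0.95}\,\mathrm{se}(\hat\alpha_2),\;
                       \hat\alpha_2 + q_{0.95}\,\mathrm{se}(\hat\alpha_2)\big].
\end{align*}
```

With this interval, rejecting $H_0 : \alpha_2 = 0$ against $H_1 : \alpha_2 < 0$ at level 5% amounts to checking that the upper bound of the interval is negative.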

The selection via Model 2 seems to be more appropriate than the selection via Model 1 : 
the p-value associated to the Fisher test is very small, which leads to accept the significance 
of Model (8). We now focus on the results obtained with the selection using Model 
2. Observe that the results concerning Exp1 and Exp2 (respectively Exp3 and Exp4) are very 
similar : the date $t_0$ of the initial amount of the GDP has little influence on the results. The estimate 
of the coefficient $\alpha_2$ is negative and the averaged confidence interval never contains zero (remember 
that the results are averaged over K = 1000 experiments). Results are better for Exp1 (and Exp2) 
than for Exp3 (and Exp4) : $N_0$ is about 2% for the first period of study instead of about 30% for the 
second period of study. Analyzing the empirical densities of the estimator $\hat\alpha_2$ given in Figure 4, 
we notice that for both periods of time, the support is almost included in $\mathbb{R}^-$ : for the period 
1965-1975 and $t_0 = 1960$, all of the runs provide negative estimated values while for the period 
1975-1985 and $t_0 = 1970$, 99.8% of the runs do. 

Notice that the number of selected instruments is quite reasonable ($S \approx 6$ or 8) so we 
can also comment on our results from a qualitative point of view. Incidentally, it is very interesting 
to identify the determinants of growth. To partially answer this question, Figure 5 shows the 
frequencies of selected variables using Model 2 when the growth rates under consideration are 
for both periods 1965-75 and 1975-85. The six vertical areas define the 6 broad categories of 
variables given at the beginning of this part. It is interesting to notice that the sets of 
selected variables are quite different for the two periods of time. For 1965-75, the most selected 
variables are indicators of the demography of the countries like "Life expectancy at age 0" (1960, 
1970, 1965-69, 1970-74) and "Total fertility rate" (1965-69), while for the period 1975-85 they are 
"Total gross enrollment ratio for secondary education" (1975), "Male gross enrollment ratio for 
secondary education" (1965), "Growth rate of population" (1965-69), "Ratio of real government 
'consumption' expenditure net of spending on defense and on education to real GDP" (1965-69) 
and "Black market premium" (1975-79, 1980-84). This can of course be partially due to 
the instability of the method, but it may also have economic interpretations. 

5.3 Comparison with results by Belloni and Chernozhukov 

In Belloni and Chernozhukov (2010), a lasso procedure is used for the first period 1965-75. 
The method depends on a tuning parameter $\lambda$ : the smaller $\lambda$ is, the more instrumental variables 
are selected and the larger (in absolute value) the estimator of $\alpha_2$ is. In Table 4, we 
recall the results by Belloni and Chernozhukov (2010) when $\lambda$ is varying and our results when 
all the data are used (K = 1), since no stability bootstrapping has been performed in Belloni 
and Chernozhukov (2010). Observe that the LOLA procedure leads to larger estimators of $\alpha_2$ 
(in absolute value) and to confidence intervals which are better separated from zero. We verify 
that zero never belongs to the confidence intervals obtained by the LOLA procedure even if the 
confidence level becomes larger than 90%. 

For 1965-75, the selected variables are "Percentage of 'no schooling' in the male population" 
(1970) and "Percentage of 'primary school attained' in the total population and in the female 
population" (1970). These variables related to the education policy are also selected by Belloni 
and Chernozhukov (2010). It is striking that LOLA selects the same variables as 
Belloni and Chernozhukov (2010). Recall that LOLA works in high dimension (here $p \approx 200$ 
or 300) while the lasso procedure is used in the standard case where p < n ($p \approx 40$). For the 
second period 1975-1985, LOLA retains 

- "Percentage of primary school complete in the female population" (1960, 1965) 

- "Population Proportion over 65" (1960) 

- "Growth rate of population" (average on 1980-84) 

- "Black market premium" (average on 1975-79, average on 1980-84) 

as in Belloni and Chernozhukov (2010) and in addition 

- "Total gross enrollment ratio for secondary education" (1975, 1980) 

- "Total fertility rate" (1970, average on 1980-84) 

- "Life expectancy at age 0" (1975). 

5.4 LOLA in a non-parametric setting 

If we are only interested in the sign of the coefficient $\alpha_2$, it could be interesting to build a 
set of tailored instrumental variables which have no economic meaning but which provide a 
very good model 

$Y = \alpha_2 X + \alpha_1 + Z\beta + \text{error}$ 

in the sense that the number S of variables Z is small and that the null hypothesis $H_0 : \alpha_2 = 0$ 
is rejected. Using our own instruments also has another advantage : the study can be conducted 
with almost all the countries ($n \approx 110$) while half were removed in the previous part because 
pieces of information were missing for some variables : for instance, China and Singapore were not 
considered for the first period of time and many countries from Africa were missing in the study 
for the second period. 

As a consequence, let us consider the vector X as a signal, and replace X by $X_{(\cdot)}$, 
where $X_{(\cdot)}$ is the vector such that $X_{(1)} \le \dots \le X_{(n)}$, in such a way that the curve of the 
initial amount of the GDP becomes smoother. Taking inspiration from the learning theory (see 
Kerkyacharian, Mougeot, Picard and Tribouley (2009)), we build a set of functions called dictionary 
$\mathcal{D}$ containing the cosine, sine, box and Schauder functions : 

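$$\mathcal{D} = \{\psi_\lambda,\ \lambda \in \Lambda\} = \{\psi^1_{\lambda_1}, \psi^2_{\lambda_2}, \psi^3_{\lambda_3}, \psi^4_{jk}\ \text{for}\ 1 \le \lambda_1, \lambda_2, \lambda_3 \le n,\ 0 \le j \le \log_2(n),\ 1 \le k \le 2^j - 1\}$$
for 

$$\psi^1_\lambda(x) = \cos(2\pi\lambda x) \quad \text{and} \quad \psi^2_\lambda(x) = \sin(2\pi\lambda x),$$
$$\psi^3_\lambda(x) = 1_{[a_\lambda, b_\lambda]}(x) \quad \text{where } a_\lambda, b_\lambda \in [0, n],$$
$$\psi^4_{jk}(x) = 2^{j/2}\,\psi^4(2^j x - k) \quad \text{where } \psi^4(x) = x\,1_{[0,0.5]}(x) - (x - 1)\,1_{[0.5,1]}(x).$$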

For each function of the dictionary, the curve $Z_\lambda = {}^t(\psi_\lambda(1/n), \dots, \psi_\lambda(i/n), \dots, \psi_\lambda(n/n))$ is 
considered as an instrumental variable and the matrix $Z = (Z_\lambda)_{\lambda \in \Lambda}$ contains $p = \#\mathcal{D}$ predictors. 
Here, we get n = 110 and p = 300. The obtained results, given in Table 5, are excellent. 
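
The construction of the dictionary matrix can be sketched as follows. This is an illustration only: the function names `hat` and `build_dictionary` are hypothetical, the box supports are taken as sub-intervals of [0, 1] on the grid i/n, and the numbers of frequencies and boxes (60 each) are assumptions chosen simply because they give 300 columns for n = 110; the choices actually made by the authors are not specified.

```python
import math

def hat(x):
    """Schauder mother function: x on [0, 0.5], 1 - x on (0.5, 1], 0 elsewhere."""
    if 0.0 <= x <= 0.5:
        return x
    if 0.5 < x <= 1.0:
        return 1.0 - x
    return 0.0

def build_dictionary(n, n_freq, boxes):
    """Evaluate every dictionary function on the grid i/n, i = 1..n (one column each)."""
    grid = [i / n for i in range(1, n + 1)]
    columns = []
    for lam in range(1, n_freq + 1):
        columns.append([math.cos(2 * math.pi * lam * x) for x in grid])
        columns.append([math.sin(2 * math.pi * lam * x) for x in grid])
    for a, b in boxes:
        columns.append([1.0 if a <= x <= b else 0.0 for x in grid])
    for j in range(int(math.log2(n)) + 1):        # 0 <= j <= log2(n)
        for k in range(1, 2 ** j):                # 1 <= k <= 2^j - 1
            columns.append([2 ** (j / 2) * hat(2 ** j * x - k) for x in grid])
    return columns

n = 110
boxes = [(m / 60, (m + 1) / 60) for m in range(60)]   # hypothetical box supports
Z = build_dictionary(n, n_freq=60, boxes=boxes)
print(len(Z), len(Z[0]))   # 300 110
```

The columns of `Z` then play the role of the instruments, and the sorted vector $X_{(\cdot)}$ is obtained with `sorted(x)` before fitting.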

Observe that the number of selected instruments is smaller than in the parametric setting and 
that the fit is better if we are only concerned with the explanation of Y by X. The confidence 
interval for $\alpha_2$ almost never contains zero : see the very small values of $N_0$. Again, we observe 
that the initial date $t_0$ has no influence on the estimation of $\alpha_2$ and that the estimator is slightly 
smaller for the second period of time. In order to compare the nonparametric methodology 
with the parametric methodology, we give again the empirical densities of the estimator $\hat\alpha_2$ 
(Figure 6) : here the results are excellent since their supports are strictly contained in $\mathbb{R}^-$. 

Finally, we end this section with a rapid comparison with Belloni and Chernozhukov 
(2010) : using all the available data in one run (Table 6), we always accept the hypothesis 
$\alpha_2 < 0$. Notice that the number of selected instruments is again relatively small and is 
comparable to the number found in Belloni and Chernozhukov (2010) for $\lambda = 0.4$. Again, our 
estimators are further away from zero than the lasso estimators. 

6 Proof of Theorem 1 

Theorem 1 is an obvious consequence of Proposition 1. 

The main ingredients for proving Proposition 1 are Lemma 1 and Lemma 2. These lemmas 
are proved in Mougeot, Picard and Tribouley (2010) : see the terms IBS and OBB + OBS for 
Lemma 1 and the term IBS for Lemma 2 and put ai = ai Pi and a} = ai Pi. 

Lemma 1. Recall that J is the set of the indices of the leaders defined in (1). Assume 
that the sparsity conditions (2) are satisfied by the sequence = For any A > 

C (Sr^ + S'logp/n) where C is a computable constant depending on rj^ and X^f^j^ |<^€A|; we get 



P[j: M.P > <2exp(-c^) 



for t = 1,2, 



with 

Vti = {1=1,..., p, \ae/3i\ > 2sn, \cri/3c\ < s„} 

and 

n2 = {£ = l,..., p, £^J, \aePe\ > 2tn} 
and where c is a universal constant. 

Lemma 2. Recall that J is the set of the indices of the leaders defined in (1). Assume that 
the sparsity conditions (2) are satisfied by the sequence {(3e),i = 1, . . . ,p and choose threshold 
levels tn, Sn such that tn > Sn^ C + y^log p/ri^ . Then, for any X > C (S/n) where C is a 
constant depending on o"^ and M , we get 

P\Y. > A <exp(-c^J 

with 

= e J, \aePi\ < Sn/2, \cjSA > s„} 
and where c is a universal constant. 

Now, we are ready to prove Proposition 1. First, we focus on FN. Using (4), and for 
fin > Sn/2, we get 

= # {{i, \aiPi\ > Mn}) <S< [r-ij 
and we conclude applying Lemma 1 with > 2(s.„ V More precisely, for a given k, we get 

P{FN >k) = P{i^{xr\{xf) > k) 

= Pme, \a,P,\ / 0, \aSi\ > Sn} = 0} > k) 

p 

< PiY, WiM^ H WiM > Ain} I{ M\ HWi^il > Sn} = 0} > kfjl) 

e=i 

<P(^ \aef3e\^ > kfil/2) + P{Y, WiPi? > kfil/2) 
nkixl" 



< 2 exp — c 



2ct2 



as soon as 



k„l/i2a')>c' [M^r'ny-y^-^] S. 

n n 

Figure 1 - Empirical second moment between Y and $X_1, \dots, X_p$ for p = 2000, n = 400, 
the predictors X's are Gaussian. Exp1 : The predictors are independent, S = 10. Exp2 : The 
predictors are dependent, S = 10. Exp3 : The predictors are independent, S = 50. The horizontal 
line is the auto-driven level t. 

Figure 2 - Estimators of the linear coefficients $\beta_\ell$ for $\ell \in C$ for p = 2000, n = 400, the predictors 
X's are Gaussian. Exp1 : The predictors are independent, S = 10. Exp2 : The predictors are 
dependent, S = 10. Exp3 : The predictors are independent, S = 50. The horizontal lines are 
the auto-driven levels $\pm s$. 

            TN               TP           FN          FP
    Exp1    1990.0 (1990)    10.0 (10)    0.0 (0)     0.0 (0)
    Exp2    1989.9 (1990)    9.2 (10)     0.8 (0)     0.7 (0)
    Exp3    1948.8 (1950)    36.1 (50)    13.9 (0)    1.2 (0)

Table 1 - Simulations performances for K = 100 replications. The true target value is given 
in brackets. 
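
The four counts reported in Table 1 can be computed from a selected set and the true support as sketched below. The usual convention is assumed (a "positive" is a selected variable); the function name `selection_counts` and the toy index sets are hypothetical.

```python
def selection_counts(selected, truth, p):
    """Confusion counts for a variable selection among p candidate predictors."""
    selected, truth = set(selected), set(truth)
    tp = len(selected & truth)     # relevant variables that are selected
    fp = len(selected - truth)     # irrelevant variables that are selected
    fn = len(truth - selected)     # relevant variables that are missed
    tn = p - tp - fp - fn          # irrelevant variables correctly left out
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

print(selection_counts(selected=[0, 1, 2, 7], truth=[0, 1, 2, 3], p=10))
# {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 5}
```

Averaging these counts over the K replications gives values comparable to those in the table, the ideal targets being TP = S, TN = p - S and FN = FP = 0.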

Figure 3 - Variables selection performances for Boston data, K = 100. Panels : LOLA (p = 2113), 
OLS with Student selection (p = 13), STEP (p = 13). 

            $\hat\alpha_2$    Confidence Interval    $N_0$    $R^2$    p-value for F test
    Exp1    0.012             [-0.038 , 0.062]       1.000    0.005    0.673
    Exp2    0.017             [-0.031 , 0.065]       1.000    0.008    0.577
    Exp3    0.051             [-0.002 , 0.104]       1.000    0.045    0.139
    Exp4    0.058             [0.006 , 0.111]        1.000    0.060    0.085

TABLE 2 : Estimation of the model $Y = \alpha_1 + \alpha_2 X + u$ for the different periods of time. All the standard deviation 
values normalized by $\sqrt{1000}$ are smaller than 10~^. 

            S             $\hat\alpha_2$    CI                                   $N_0$            p-value for F test
    Selection via Model 1
    Exp1    10.8 (0.1)    -0.245 (0.002)    [-0.358 (0.002), -0.132 (0.002)]     0.011 (0.003)    0.013 (0.001)
    Exp2    10.? (0.1)    ? (0.002)         [-0.?42 (0.002), -0.10? (0.002)]     0.0?? (0.00?)    0.0?? (0.002)
    Exp3    14.2 (0.1)    -0.227 (0.004)    [-0.429 (0.004), -0.025 (0.004)]     0.452 (0.015)    0.155 (0.005)
    Exp4    14.6 (0.1)    -0.175 (0.004)    [-0.434 (0.005), 0.083 (0.004)]      0.742 (0.014)    0.255 (0.006)
    Selection via Model 2
    Exp1    6.3 (0.1)     -0.246 (0.002)    [-0.342 (0.002), -0.150 (0.001)]     0.016 (0.004)    0.007 (0.041)
    Exp2    6.3 (0.1)     -0.232 (0.002)    [-0.331 (0.002), -0.132 (0.001)]     0.025 (0.005)    0.011 (0.042)
    Exp3    8.4 (0.1)     -0.166 (0.003)    [-0.266 (0.003), -0.066 (0.002)]     0.230 (0.013)    0.004 (0.017)
    Exp4    8.4 (0.1)     -0.149 (0.003)    [-0.262 (0.003), -0.036 (0.002)]     0.311 (0.014)    0.009 (0.034)

TABLE 3 : Estimation of $\alpha_2$ for determinants Z selected via Model 1 (above) and Model 2 (below). 

FIGURE 4. Exp1 : 1965-75, $t_0 = 1960$, empirical density of $\hat\alpha_2$. Exp3 : 1975-85, $t_0 = 1970$, empirical density of $\hat\alpha_2$. 

FIGURE 5. Selected variables for different periods 1965-1975, $t_0 = 1960$ (bottom) and 1975-1985, $t_0 = 1970$ 
(below). Area 1 : Education, Area 2 : Population/Fertility, Area 3 : Government Expenditures, Area 4 : PPP 
deflators, Area 5 : Political variables, Area 6 : Trade Policy and others. 

            Method    $\lambda$    S     $\hat\alpha_2$    CI
    Exp1    BC        1.1??        2     -0.01?            [-0.02? , -0.004]
    Exp1    BC        ?            4     -0.04?            [-0.054 , -0.029]
    Exp1    BC        0.591        3     -0.044            [-0.065 , -0.034]
    Exp1    BC        0.473        11    -0.051            [-0.065 , -0.032]
    Exp1    M2                     3     -0.136            [-0.212 , -0.060]
    Exp2    M2                     3     -0.116            [-0.191 , -0.040]
    Exp3    M2                     11    -0.351            [-0.442 , -0.261]
    Exp4    M2                     11    -0.332            [-0.443 , -0.221]

TABLE 4. Estimation of $\alpha_2$ for selection obtained via lasso procedure (BC) and LOLA for Model 2 (M2). 

            S        $\hat\alpha_2$    CI                    $N_0$
    Selection via Model 1
    Exp1    7.964    -0.11?            [-0.14? , -0.0??]     0.001
    Exp2    ?.890    -0.11?            [-0.144 , -0.0??]     0.000
    Exp3    7.958    -0.130            [-0.165 , -0.096]     0.002
    Exp4    7.769    -0.139            [-0.172 , -0.106]     0.000
    Selection via Model 2
    Exp1    8.772    -0.085            [-0.099 , -0.072]     0.002
    Exp2    8.736    -0.083            [-0.097 , -0.070]     0.005
    Exp3    8.766    -0.096            [-0.111 , -0.080]     0.004
    Exp4    8.773    -0.094            [-0.109 , -0.079]     0.002

TABLE 5. Estimation of $\alpha_2$ for various periods of time and for predictors Z selected via Model 1 (above) and 
Model 2 (below). All the standard deviation values normalized by $\sqrt{1000}$ are smaller than 10^"^. 

FIGURE 6. Exp1 : 1965-75, $t_0 = 1960$, empirical density of $\hat\alpha_2$. Exp3 : 1975-85, $t_0 = 1970$, empirical density of $\hat\alpha_2$. 

            Model 1                                         Model 2
            S    $\hat\alpha_2$    CI                       S    $\hat\alpha_2$    CI
    Exp1    8    -0.120            [-0.148 , -0.092]        8    -0.086            [-0.098 , -0.073]
    Exp2    8    -0.116            [-0.144 , -0.089]        8    -0.083            [-0.095 , -0.071]
    Exp3    8    -0.133            [-0.165 , -0.100]        9    -0.100            [-0.114 , -0.085]
    Exp4    7    -0.140            [-0.172 , -0.108]        9    -0.093            [-0.107 , -0.079]

TABLE 6. Estimation of $\alpha_2$ for both periods of time and for determinants Z selected via Model 1 (left) and 
Model 2 (right). 